Hot new word discovery applied for detection of network hot news

WANG Yu, XU Jianmin

Journal of Computer Applications 2020, 40 (12): 3513-3519. DOI: 10.11772/j.issn.1001-9081.2020040549

By analyzing the characteristics of hot words in network news, a hot new word discovery method was proposed for the detection of network hot news. Firstly, the Frequent Pattern tree (FP-tree) algorithm was improved to extract frequent word strings as hot new word candidates. Much of the useless information in the news data was removed by deleting infrequent 1-word strings and by cutting the news data at infrequent 1-word and 2-word strings, which greatly decreased the complexity of the FP-tree. Secondly, the multivariate Pointwise Mutual Information (PMI) was formed by expanding the binary PMI, and the Time PMI (TPMI) was formed by introducing the time features of hot words. TPMI was used to judge the internal cohesion and timeliness of the hot new word candidates, so as to remove unqualified candidates. Finally, branch entropy was used to determine the boundaries of new words when selecting hot new words. A dataset of 7,222 news headlines collected from Baidu network news was used for the experiments. When events reported at least 8 times in half a month were treated as hot news and the adjustment coefficient of the time feature was set to 2, TPMI correctly recognized 51 hot words; it missed 2 hot words that had stayed hot for a long time and 2 less-hot words that occurred too infrequently. The multivariate PMI without time features correctly recognized all 55 hot words but incorrectly recognized 97 non-hot words. The analysis shows that the time and space cost is reduced by decreasing the complexity of the FP-tree, and the experimental results show that the recognition rate of hot new words is improved by introducing the time feature into the hot new word judgement.
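
The two scoring ideas named in this abstract, a PMI extended to more than two words and branch entropy at a candidate's boundary, can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the n-word PMI below (log of the joint probability over the product of the single-word probabilities) is an assumed, commonly used generalization, and the time weighting of TPMI is not reproduced. The function names are hypothetical.

```python
import math
from collections import Counter


def multi_pmi(p_joint, p_parts):
    """Assumed n-word PMI: log2(p(w1..wn) / (p(w1) * ... * p(wn))).

    This is one common generalization of binary PMI, not necessarily
    the formula used in the paper.
    """
    return math.log2(p_joint / math.prod(p_parts))


def branch_entropy(neighbors):
    """Shannon entropy (in bits) of the tokens seen next to a candidate.

    High entropy on both sides suggests the candidate is a complete
    unit; low entropy suggests it is a fragment of a longer string.
    """
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A candidate whose words co-occur no more often than chance scores 0 under `multi_pmi`, and a candidate always followed by the same token has a right branch entropy of 0, so both would be weak new-word candidates under these assumed definitions.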

New improved 1-2-order fractional differential edge detection model based on Riemann-Liouville integral

WANG Chengxiao, HUANG Huixian, YANG Hui, XU Jianmin

Journal of Computer Applications 2016, 36 (1): 227-232. DOI: 10.11772/j.issn.1001-9081.2016.01.0227

Integer-order differential and 0-1-order fractional differential mask operators in digital image processing fail to locate edge information accurately and lose image texture detail. To address these issues, a new 1-2-order edge detection operator based on the Laplacian operator was proposed. Derived from the Riemann-Liouville (R-L) definition, the 1-2-order fractional differential has the advantage of enhancing high-frequency signals and reinforcing medium-frequency signals. The simulation results demonstrate that the proposed operator achieves a higher recognition rate in subjective evaluation and is better at extracting edge information, especially for images with rich texture detail in smooth regions where the gray scale changes little. Objectively, its integrated location error rate is 7.41%, which is lower than that of the integer-order differential operators (a minimum of 10.36%) and the 0-1-order differential operators (a minimum of 9.97%). The quantitative indicators show that the new fractional operator effectively improves the positioning accuracy of edges and is particularly suitable for edge detection in images with high-frequency information.
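
To give a feel for what a differential of order between 1 and 2 looks like as a mask, the sketch below computes one-dimensional fractional difference weights. It deliberately uses the Grünwald-Letnikov definition, which has a simple recurrence, rather than the Riemann-Liouville derivation and Laplacian-based mask described in the abstract, so it is an illustration of the general idea and not the paper's operator. The function name is hypothetical.

```python
def gl_coeffs(order, n_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * C(order, k).

    Computed with the recurrence w_0 = 1,
    w_k = w_{k-1} * (1 - (order + 1) / k).
    NOTE: a stand-in for illustration; the paper derives its mask
    from the Riemann-Liouville definition instead.
    """
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (order + 1.0) / k))
    return weights
```

With `order=1.0` the weights reduce to the first difference (1, -1, 0, ...), and with `order=2.0` to the second difference (1, -2, 1, 0, ...). An order such as 1.5 gives weights in between, with a tail that decays instead of vanishing.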

Dynamic model combining with time factor for event tracking

XU Jianmin, SUN Xiaolei, WU Guifang

Journal of Computer Applications 2013, 33 (10): 2807-2810.

For Internet news tracking, a dynamic event tracking model that makes use of time information was proposed. The dynamic model introduced a time factor into the traditional vector model to obtain the time similarity of the characteristic words shared by a document and an event, and then applied this time similarity when calculating the similarity between the document and the event. If a document was related to the event, the new characteristic words in the document were added to the event term set, and the weights and time information of the characteristic words in the event term set were readjusted. The experiments were evaluated with Detection Error Tradeoff (DET) curves, and the results show that the dynamic model improves the system performance effectively, reducing the minimum normalized tracking cost by about 9%.
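
The idea of weighting a vector-model similarity by how close in time the shared words are can be sketched as follows. The abstract does not state the paper's time-similarity formula, so the exponential decay, the `tau` parameter, and the function names here are all assumptions made for illustration only.

```python
import math


def time_sim(t_doc, t_event, tau=7.0):
    """Assumed time similarity: decays exponentially with the gap in days."""
    return math.exp(-abs(t_doc - t_event) / tau)


def doc_event_similarity(doc, event, tau=7.0):
    """Cosine-style similarity with each shared term scaled by time_sim.

    doc and event map a term to a (weight, time_in_days) pair.
    """
    shared = doc.keys() & event.keys()
    dot = sum(
        doc[t][0] * event[t][0] * time_sim(doc[t][1], event[t][1], tau)
        for t in shared
    )
    norm_doc = math.sqrt(sum(w * w for w, _ in doc.values()))
    norm_event = math.sqrt(sum(w * w for w, _ in event.values()))
    if norm_doc == 0.0 or norm_event == 0.0:
        return 0.0
    return dot / (norm_doc * norm_event)
```

Under this assumed form, a document that shares words with an event scores lower the further apart in time those words were observed, which is the qualitative behavior the abstract describes. The step of adding new words to the event term set and readjusting weights is not shown.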